International Journal of Research in Computer Science 
eISSN 2249-8265 Volume 4 Issue 2 (2014) pp. 19-24
www.ijorcs.org, A Unit of White Globe Publications 
doi: 10.7815/ijorcs.42.2014.081 



FPGA IMPLEMENTATION OF FIR FILTER USING 
VARIOUS ALGORITHMS: A RETROSPECTIVE 

Jinalkumari K. Dhobi 1, Dr. Y. B. Shukla 2, Dr. K. R. Bhatt 3

1 P.G. Student, EC Dept., SVIT, Vasad
Email: jinal dhobi@gmail.com

2 Assoc. Prof., EC Dept., SVIT, Vasad
Email: ybshukla2003@gmail.com

3 Assoc. Prof., EC Dept., SVIT, Vasad
Email: krbhattec@gmail.com



Abstract: This manuscript is a thorough study of the FPGA implementation of Finite Impulse Response (FIR) filters with low cost and high performance. The main contribution of this paper is an elaborate analysis of hardware implementations of FIR filters using different algorithms, i.e., Distributed Arithmetic (DA), DA-Offset Binary Coding (DA-OBC), Common Subexpression Elimination (CSE) and Sum-of-Power-of-Two (SOPOT), which use fewer resources without affecting the performance of the original FIR filter.
I. INTRODUCTION

A digital filter is a system that carries out mathematical operations on a sampled, discrete-time signal to modify or alter components of the signal in the time or frequency domain. It consists of an analog-to-digital converter at the front end, followed by a microprocessor and some peripheral components, such as memory to store data and filter coefficients. At the back end, a digital-to-analog converter completes the output stage. For real-time applications, an FPGA, an ASIC or a specialized DSP with a parallel architecture is used instead of a general-purpose microprocessor. Digital filters can be implemented in two ways: by convolution (FIR) and by recursion (IIR) [1]. An FIR filter has a number of useful properties compared with an IIR filter: it is inherently stable, requires no feedback and can be designed to have linear phase. With the recent trend towards portable computing and wireless communication systems, power consumption has become an important design consideration [2], and the field programmable gate array (FPGA) is an alternative solution for realizing digital signal processing tasks.

This paper is organized as follows: Section II introduces the design methods of the FIR filter, Section III describes the various algorithms for implementing the FIR filter on an FPGA, Section IV compares the algorithms and, finally, Section V presents the conclusion.



II. DESIGN METHODS 

FIR filters are used where an exactly linear phase response is required. The design of such a non-recursive filter consists of two parts: the approximation problem and the realization problem. The approximation steps are: first, take the ideal frequency response; then, choose the class of filter and the quality of approximation; finally, select the method used to find the filter transfer function. The realization part selects the structure that implements the transfer function. There are three well-known methods for designing FIR filters, namely the window method, the frequency sampling technique and the optimal filter design method. Among these, the window method is a simple and efficient way to design an FIR filter [3]. In this method, one starts with the ideal desired frequency response $H_d(e^{j\omega})$ of the specified filter and computes its inverse DTFT, $h_d(n)$, which is infinite in duration. An appropriate window function $w(n)$ is then chosen, and the impulse response of the specified filter is calculated as $h(n) = h_d(n)\,w(n)$, which truncates it at $n = M-1$. Once $h(n)$ is determined, its DTFT $H(e^{j\omega})$ and Z-transform $H(z)$ can be calculated for any further analysis. In the simplest case, the truncation of $h_d(n)$ to M terms in the third step is done directly by multiplying the infinitely long $h_d(n)$ with a rectangular window $w(n)$; however, the rectangular window contains a sharp discontinuity, which leads to the Gibbs phenomenon [4]. To reduce the ripples, a window function that tapers and decays gradually toward zero is chosen instead of the sharp discontinuity [4].
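The window-method steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the desired response is taken to be an ideal low-pass filter with cutoff `wc` (the text does not fix a filter type), delayed by (M-1)/2 samples so that h(n) is symmetric and therefore linear phase. The helper names `window_design_lowpass` and `max_stopband_gain` are hypothetical.

```python
import numpy as np

def window_design_lowpass(M, wc, window):
    """Window method: h(n) = h_d(n) * w(n) for n = 0..M-1.

    M      : number of taps
    wc     : cutoff of the (assumed) ideal low-pass response, rad/sample
    window : length-M window w(n)
    """
    n = np.arange(M)
    alpha = (M - 1) / 2.0                     # delay that makes h(n) symmetric (linear phase)
    # Ideal response h_d(n) = sin(wc (n - alpha)) / (pi (n - alpha)).
    # np.sinc(t) = sin(pi t) / (pi t) also handles the n == alpha sample.
    hd = (wc / np.pi) * np.sinc(wc * (n - alpha) / np.pi)
    return hd * window                        # truncation to M terms by windowing

def max_stopband_gain(h, ws, nfft=4096):
    """Largest |H(e^jw)| for ws <= w <= pi, evaluated on a zero-padded FFT grid."""
    H = np.abs(np.fft.rfft(h, nfft))
    w = np.linspace(0.0, np.pi, len(H))
    return H[w >= ws].max()

M, wc = 31, 0.4 * np.pi
h_rect = window_design_lowpass(M, wc, np.ones(M))      # direct truncation
h_hamm = window_design_lowpass(M, wc, np.hamming(M))   # tapered window
print(max_stopband_gain(h_rect, 0.6 * np.pi), max_stopband_gain(h_hamm, 0.6 * np.pi))
```

Comparing the two printed values should show the much lower stopband ripple of the tapered (Hamming) window relative to direct truncation, paid for with a wider transition band.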
Some commonly used windows [5] are as follows:
2. Generalized cosine windows (Rectangular, Hanning, Hamming and Blackman):

$$w(n) = a - b\cos\left(\frac{2\pi(n+1)}{N+1}\right) + c\cos\left(\frac{4\pi(n+1)}{N+1}\right), \quad n = 0, 1, \dots, N-1 \qquad (2)$$

3. Kaiser window with parameter $\beta$:

(3)
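Eq. (2) can be evaluated directly. The sketch below assumes the standard textbook (a, b, c) values for each window, which the text does not list; `generalized_cosine_window` and `COSINE_WINDOWS` are hypothetical names.

```python
import numpy as np

# Standard textbook (a, b, c) values for Eq. (2); assumed, not taken from [5].
COSINE_WINDOWS = {
    "rectangular": (1.0, 0.0, 0.0),
    "hanning": (0.5, 0.5, 0.0),
    "hamming": (0.54, 0.46, 0.0),
    "blackman": (0.42, 0.5, 0.08),
}

def generalized_cosine_window(N, a, b, c):
    """Eq. (2): w(n) = a - b cos(2 pi (n+1)/(N+1)) + c cos(4 pi (n+1)/(N+1)), n = 0..N-1."""
    n = np.arange(N)
    theta = 2.0 * np.pi * (n + 1) / (N + 1)
    return a - b * np.cos(theta) + c * np.cos(2.0 * theta)

w_hamming = generalized_cosine_window(16, *COSINE_WINDOWS["hamming"])
```

Because n + 1 runs from 1 to N while the denominator is N + 1, this form skips the end points: for Hanning, Hamming and Blackman it matches the interior samples of NumPy's `np.hanning(N + 2)`, `np.hamming(N + 2)` and `np.blackman(N + 2)`, so the zero-valued end samples of the Hanning and Blackman windows are not wasted as filter taps.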
With the rectangular window, the direct truncation of $h_d(n)$ leads to the Gibbs phenomenon, which manifests itself as a fixed-percentage overshoot and ripple before and after an approximated discontinuity in the frequency response, caused by the non-uniform convergence of the Fourier series at the discontinuity [5]. According to the simulation results in [5], obtained with FDATool, the Bartlett window reduces the overshoot in the designed filter but widens the transition region. The generalized cosine windows (Hanning, Hamming and Blackman) are more complicated but provide a smooth truncation of the ideal impulse response and a smooth frequency response. The best window method is the Kaiser window, because its shape parameter $\beta$, together with the number of filter taps M, adjusts the main-lobe width and side-lobe attenuation; choosing $\beta$ and M can produce a variety of transition bands and the optimal stopband attenuation [6]. When the transition band $\Delta\omega$ (rad) and the stopband attenuation $\delta_2 = -20\log_{10}\alpha_2$ (dB) are given, the number of Kaiser FIR filter taps M and the shape parameter $\beta$ can be obtained from Eq. (4) and (5) [6].
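As a sketch of how M and $\beta$ follow from the transition width and the attenuation, the code below uses Kaiser's widely published empirical formulas: $\beta = 0.1102(A-8.7)$ for $A > 50$, $\beta = 0.5842(A-21)^{0.4} + 0.07886(A-21)$ for $21 \le A \le 50$, $\beta = 0$ otherwise, and filter order $(A-8)/(2.285\,\Delta\omega)$. These formulas are an assumption here and may differ in detail from Eq. (4) and (5) of [6]; `kaiser_parameters` is a hypothetical name, while `np.kaiser` is NumPy's Kaiser window.

```python
import math
import numpy as np

def kaiser_parameters(atten_db, delta_w):
    """Return (M, beta) from Kaiser's empirical formulas (assumed, see text).

    atten_db : stopband attenuation A in dB (positive number)
    delta_w  : transition band width in rad/sample
    """
    A = atten_db
    if A > 50:
        beta = 0.1102 * (A - 8.7)
    elif A >= 21:
        beta = 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    else:
        beta = 0.0                                   # rectangular window
    order = math.ceil((A - 8) / (2.285 * delta_w))   # filter order, rounded up
    return order + 1, beta                           # number of taps M = order + 1

M, beta = kaiser_parameters(60.0, 0.1 * math.pi)
w = np.kaiser(M, beta)                               # length-M Kaiser window
```

For 60 dB attenuation and a 0.1π transition band this gives M = 74 taps and β ≈ 5.65; the resulting window is then used in place of w(n) in the window method.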

III. FPGA IMPLEMENTATION USING VARIOUS 
ALGORITHMS 

Applications of digital filters are widespread and include, but are not limited to, communication systems, audio systems, instrumentation, image processing and enhancement, and speech synthesis. It is nowadays convenient to consider computer programs and digital hardware that perform digital filtering as two different implementations of digital filters, namely software digital filters and hardware digital filters. Software digital filters can be implemented in a high-level language, such as C++ or MATLAB, on a personal computer or workstation, or in a low-level language on a general-purpose digital signal processing chip. Hardware digital filters can be designed using a number of highly specialized, interconnected VLSI chips such as DSPs, ASICs and FPGAs. Software digital filters have no counterpart in the analog world and are therefore the only choice for non-real-time applications. An ASIC-based implementation of an FIR filter is somewhat costlier and has a longer development time, but gives high flexibility. The FPGA-based realization of FIR filters has received extensive attention because it offers high flexibility, high performance, low cost and a shorter development time [8]. The core of an FIR filter implementation is the multiplication and accumulation (MAC) operation. The MAC can be designed using various algorithms and techniques. One is the general direct MAC structure, which is expensive in hardware because of its logic complexity and area usage. The others are Distributed Arithmetic (DA) [7], DA-Offset Binary Coding (DA-OBC) [9], Common Subexpression Elimination (CSE) [10] and Sum-of-Power-of-Two (SOPOT) [11], which reduce the required number of multipliers and adders.
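For reference, the general direct MAC structure mentioned above can be modelled in a few lines of Python. This is a behavioural sketch only (the name `fir_direct_mac` is hypothetical): each output sample costs one multiplication per tap, which in hardware means N multipliers and N-1 adders.

```python
def fir_direct_mac(h, x):
    """Direct-form FIR: y(n) = sum_k h[k] * x(n - k), one MAC per tap.

    h : filter coefficients h[0..N-1]
    x : input samples; samples before x[0] are taken as zero
    """
    N = len(h)
    delay_line = [0] * N                          # x(n), x(n-1), ..., x(n-N+1)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for k in range(N):                        # multiply-accumulate over all taps
            acc += h[k] * delay_line[k]
        y.append(acc)
    return y
```

For example, `fir_direct_mac([1, 2, 3], [1, 0, 0, 0])` returns the impulse response `[1, 2, 3, 0]`. The DA, CSE and SOPOT techniques below all aim to remove the `h[k] * delay_line[k]` multiplications from this loop.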

A. Distributed arithmetic and DA-OBC 

Distributed Arithmetic is a well-known method that converts the MAC calculation into a series of look-up table accesses and summations [8]. It was initially proposed by Croisier in 1973 [7] and further developed by Peled and Liu [8]. The principle of the DA algorithm is as follows.

An N-tap FIR filter is given by Eq. (6):

$$y(n) = \sum_{k=0}^{N-1} h_k\, x_k(n) \qquad (6)$$

where $h_k$ is the set of constant filter coefficients, $y(n)$ is the output data and $x_k(n)$ is the input data, which can be expressed as Eq. (7) using an m-bit two's complement binary number:

$$x_k = -x_{k,m-1} + \sum_{j=1}^{m-1} x_{k,m-1-j}\, 2^{-j} \qquad (7)$$

where $x_{k,m-1}$ is the most significant (sign) bit of $x_k$. Substituting $x_k$ from Eq. (7) into Eq. (6) gives Eq. (8):

$$y(n) = \sum_{j=1}^{m-1}\left[\sum_{k=0}^{N-1} h_k\, x_{k,m-1-j}\right] 2^{-j} - \sum_{k=0}^{N-1} h_k\, x_{k,m-1} \qquad (8)$$



In Eq. (8), the inner sum $\sum_{k=0}^{N-1} h_k x_{k,m-1-j}$ has only $2^N$ possible results, because each bit $x_{k,m-1-j}$ is 0 or 1. All possible results of $\sum_{k=0}^{N-1} h_k x_{k,m-1-j}$ are computed in advance and stored in one LUT. The LUT contents are fetched using the bits $x_{k,m-1-j}$, obtained from a shift register unit, as the LUT address. The value of $y(n)$ is then calculated by shift and accumulate operations.
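The LUT construction of Table 1 and the bit-serial shift-and-accumulate loop can be simulated in Python. This is a minimal sketch with two assumptions: inputs are m-bit two's complement integers rather than the fractions of Eq. (7), so bit j carries weight $2^{j}$ instead of $2^{-j}$, and address bit k is taken from $x_k$ (bit $b_0$ selects $h_0$, as in Table 1). The function names are hypothetical.

```python
def build_da_lut(h):
    """LUT of Table 1: the entry at address a is the sum of h[k] for every set bit k of a."""
    N = len(h)
    return [sum(h[k] for k in range(N) if (addr >> k) & 1) for addr in range(2 ** N)]

def da_inner_product(h, x, m):
    """Compute sum_k h[k] * x[k] with distributed arithmetic (no multipliers).

    h : N constant coefficients
    x : N input samples, each an m-bit two's complement integer
    m : input word length in bits
    """
    lut = build_da_lut(h)
    N = len(h)
    acc = 0
    for j in range(m):                       # one bit plane per clock cycle, LSB first
        addr = 0
        for k in range(N):
            # Bit j of x_k; Python's >> on negative ints is arithmetic,
            # so this reads the two's complement bits directly.
            addr |= ((x[k] >> j) & 1) << k
        partial = lut[addr] * 2 ** j         # in hardware: a shift, not a multiply
        # The sign bit (j = m-1) has weight -2^(m-1), so its LUT value is subtracted.
        acc += -partial if j == m - 1 else partial
    return acc
```

With h = [3, -1, 4, 2] and x = [5, -7, 2, -8] (m = 4), `da_inner_product` returns 14, the same as the direct sum of products, after four LUT reads and four shift-accumulate steps.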

Table 1: LUT contents of DA for 4-tap FIR filter

Address signal (b3 b2 b1 b0)    LUT content
0000                            0
0001                            h0
0010                            h1
0011                            h1 + h0
0100                            h2
0101                            h2 + h0
0110                            h2 + h1
0111                            h2 + h1 + h0
1000                            h3
1001                            h3 + h0
1010                            h3 + h1
1011                            h3 + h1 + h0
1100                            h3 + h2
1101                            h3 + h2 + h0
1110                            h3 + h2 + h1
1111                            h3 + h2 + h1 + h0

Hence, the DA algorithm reduces logic resources by converting the complex MAC operation into simple look-up table accesses and summations. Its main disadvantage is that the LUT is large: its capacity ($2^N$ words for N taps) increases exponentially with the order of the filter. Bo Hong et al. [9] proposed an algorithm that halves the LUT capacity by using offset binary coding (OBC). The principle of the DA-OBC algorithm is as follows [12].

$x_k$ can also be expressed as Eq. (9):

$$x_k = \frac{1}{2}\left[x_k - (-x_k)\right] = \frac{1}{2}\left[-\left(x_{k,m-1} - \bar{x}_{k,m-1}\right) + \sum_{j=1}^{m-1}\left(x_{k,m-1-j} - \bar{x}_{k,m-1-j}\right)2^{-j} - 2^{-(m-1)}\right] \qquad (9)$$

where $\bar{x}_{k,j}$ denotes the complement of bit $x_{k,j}$. They define $d_{k,j}$ as follows; its value is -1 or +1:

$$d_{k,j} = \begin{cases} x_{k,j} - \bar{x}_{k,j}, & j \neq m-1 \\ -\left(x_{k,m-1} - \bar{x}_{k,m-1}\right), & j = m-1 \end{cases} \qquad (10)$$

Eq. (9) can then be simplified to Eq. (11):

$$x_k = \frac{1}{2}\left[\sum_{j=0}^{m-1} d_{k,m-1-j}\, 2^{-j} - 2^{-(m-1)}\right] \qquad (11)$$

Substituting Eq. (11) into Eq. (6) gives

$$y(n) = \sum_{j=0}^{m-1}\left(\sum_{k=0}^{N-1}\frac{1}{2}h_k\, d_{k,m-1-j}\right)2^{-j} - 2^{-(m-1)}\,\frac{1}{2}\sum_{k=0}^{N-1} h_k \qquad (12)$$

For convenience, define

$$D_j = \sum_{k=0}^{N-1}\frac{1}{2}h_k\, d_{k,m-1-j}, \qquad D_{ex} = -\frac{1}{2}\sum_{k=0}^{N-1} h_k \qquad (13)$$

so that $y(n) = \sum_{j=0}^{m-1} D_j\, 2^{-j} + 2^{-(m-1)} D_{ex}$.

$D_j$ takes $2^N$ possible values, which can be precomputed and stored in a LUT. The LUT can be reduced by half [9] because entries whose addresses are one's complements of each other have the same magnitude and opposite signs.
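The halving argument can be checked with a small simulation of Eq. (10) to (13). This is a sketch under assumptions: inputs are m-bit two's complement integers (so the constant term $D_{ex}$ enters with weight 1 instead of $2^{-(m-1)}$), coefficients are integers, and the table is folded on address bit 0 (folding on any single address bit works equally well). The function names are hypothetical.

```python
from fractions import Fraction

def build_obc_half_lut(h):
    """Half-size OBC table: T(a) = 1/2 * sum_k h[k] * d_k, with d_k = +1 if bit k of a is set
    else -1. Only addresses with bit 0 == 0 are stored; the rest follow from T(~a) = -T(a)."""
    N = len(h)
    half = []
    for addr in range(0, 2 ** N, 2):                 # addresses with bit 0 == 0
        s = sum(h[k] if (addr >> k) & 1 else -h[k] for k in range(N))
        half.append(Fraction(s, 2))
    return half

def obc_lookup(half, addr, N):
    """Read T(addr) from the half table using the one's complement symmetry."""
    if addr & 1:                                     # not stored: complement address, negate
        comp = ~addr & ((1 << N) - 1)
        return -half[comp >> 1]
    return half[addr >> 1]

def da_obc_inner_product(h, x, m):
    """sum_k h[k] * x[k] for m-bit two's complement integers x[k], via DA-OBC."""
    N = len(h)
    half = build_obc_half_lut(h)
    acc = -Fraction(sum(h), 2)                       # D_ex term (integer scaling)
    for j in range(m):                               # LSB first
        addr = 0
        for k in range(N):
            addr |= ((x[k] >> j) & 1) << k
        t = obc_lookup(half, addr, N)
        acc += (-t if j == m - 1 else t) * 2 ** j    # sign-bit plane negated, as in Eq. (10)
    return acc
```

For a 4-tap filter the table holds 8 entries instead of the 16 of Table 1; with h = [3, -1, 4, 2] and x = [5, -7, 2, -8] the result is again 14. `Fraction` is used only to keep the halves exact in this model.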

Table 2: LUT contents of DA-OBC for 4-tap FIR filter 
B. Common Subexpression Elimination (CSE)

M. Thenmozhi et al. [10] proposed an algorithm based on common subexpression elimination (CSE) that reduces complexity. In the conventional form of CSE, the coefficients are represented in canonical signed digit (CSD) form, which minimizes the number of adders/subtractors used in each coefficient multiplier. The aim of the CSE algorithm is to identify multiple occurrences of identical bit patterns in the coefficients and remove the redundant multiplications, which considerably reduces the number of adders, and hence the complexity of the FIR filter, compared with a conventional implementation. The algorithm in [10] uses the binary representation of the coefficients to implement higher-order FIR filters with fewer adders than CSD-based CSE. The Binary CSE (BCSE) technique focuses on reducing redundant computations in coefficient multipliers by reusing the most common binary bit patterns, called binary common subexpressions (BCSs), present in the coefficients [13]. In an n-bit binary number, the number of BCSs is $2^n - (n+1)$.



For example, a 3-bit binary representation can form four BCSs:

[0 1 1]: $x_2 = 2^{-1}x + 2^{-2}x$
[1 0 1]: $x_3 = x + 2^{-2}x$
[1 1 0]: $x_4 = x + 2^{-1}x$
[1 1 1]: $x_5 = x + 2^{-1}x + 2^{-2}x$



where x is the input signal. The other patterns, [0 0 1], [0 1 0] and [1 0 0], do not require any adder because they have only one nonzero bit. From the above four BCSs, they conclude that $x_2$ can be obtained from $x_4$ by a right shift, $x_2 = 2^{-1}x_4$, without any extra adder, and that $x_5$ can be obtained from $x_4$ using one adder, $x_5 = x_4 + 2^{-2}x$.
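The adder sharing in this 3-bit example can be made concrete in Python. This sketch only illustrates the example above, not the full BCSE algorithm of [10] or [13]: `enumerate_bcs` lists the patterns with at least two nonzero bits, `naive_adder_count` counts adders when each pattern is built separately, and `bcse_products` reuses $x_4$ as described. All names are hypothetical, and division by 2 or 4 stands in for a right shift.

```python
def enumerate_bcs(n):
    """All n-bit patterns with at least two nonzero bits, i.e. the BCSs.

    Single-bit patterns are plain shifts and need no adder, so there are
    2**n - (n + 1) BCSs.
    """
    return [format(p, f"0{n}b") for p in range(2 ** n) if bin(p).count("1") >= 2]

def naive_adder_count(patterns):
    """Adders needed if each pattern is built on its own: (number of ones - 1) each."""
    return sum(p.count("1") - 1 for p in patterns)

def bcse_products(x):
    """Products of x with the four 3-bit BCSs, reusing x4 as in the example.

    Bit weights are 1, 2^-1, 2^-2 (MSB first). Returns (products, adders used).
    """
    adders = 0
    x4 = x + x / 2          # [1 1 0] = x + 2^-1 x   : 1 adder
    adders += 1
    x2 = x4 / 2             # [0 1 1] = 2^-1 x4      : shift only
    x5 = x4 + x / 4         # [1 1 1] = x4 + 2^-2 x  : 1 adder
    adders += 1
    x3 = x + x / 4          # [1 0 1] = x + 2^-2 x   : 1 adder
    adders += 1
    return {"011": x2, "101": x3, "110": x4, "111": x5}, adders

products, shared = bcse_products(12.0)
print(shared, naive_adder_count(enumerate_bcs(3)))   # prints: 3 5
```

Reusing $x_4$ brings the four products down from five adders to three, which is the kind of saving BCSE looks for across all coefficients of a filter.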

C. Sum-of-Power-of-Two (SOPOT)

Catalin Damian et al. [11] proposed a high-speed, low-area architecture for implementing FIR filters without any multiplication block.
Figure 1: Direct FIR filter



Figure 2 describes the architecture unit of an FIR filter with SOPOT-type coefficients. In this algorithm, the coefficient values are integer powers of two or sums of powers of two (SOPOT) with two or three terms, so the multipliers can be replaced by shifters. FIR architectures based on the theory of two-term SOPOT were presented in [15] and [16]. Because approximating the filter coefficients with two terms introduces an error, the algorithm was extended to SOPOT with three terms.




Figure 2. Architecture unit of FIR filter with SOPOT type 
coefficient 



$$f[j] = \sum_{i} a_{i,j}\, 2^{b_{i,j}} \qquad (14)$$

where the sum runs over the two or three SOPOT terms of coefficient $f[j]$, $a_{i,j} \in \{-1, 1\}$ and $b_{i,j} \in \{-t, \dots, 0, \dots, u\}$; t and u determine the word length (dynamic range) of each filter coefficient. The larger t and u are, the closer the approximation to the original number.
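The point of Eq. (14) is that a product $x \cdot f[j]$ needs only shifts and additions or subtractions. The Python sketch below assumes integer input samples and represents each coefficient as a list of hypothetical `(a, b)` pairs with $a \in \{-1, +1\}$; a negative b is modelled as an arithmetic right shift, which truncates as a hardware shifter would. The function names are illustrative, not the architecture of [11].

```python
def sopot_value(terms):
    """Value of f = sum_i a_i * 2**b_i for terms [(a_i, b_i), ...], a_i in {-1, +1}."""
    return sum(a * 2.0 ** b for a, b in terms)

def sopot_multiply(x, terms):
    """x * f using shifts and adds/subtracts only (a multiplier-free tap).

    x is an integer sample; b >= 0 is a left shift, b < 0 an arithmetic right shift.
    """
    acc = 0
    for a, b in terms:
        shifted = x << b if b >= 0 else x >> -b
        acc = acc + shifted if a > 0 else acc - shifted
    return acc

def sopot_fir(coef_terms, x):
    """Multiplier-free FIR: y(n) = sum_j f[j] * x(n - j), every product via sopot_multiply."""
    y = []
    for n in range(len(x)):
        acc = 0
        for j, terms in enumerate(coef_terms):
            if n - j >= 0:
                acc += sopot_multiply(x[n - j], terms)
        y.append(acc)
    return y
```

For instance, the coefficient 15 can be written as $2^4 - 2^0$, so `sopot_multiply(7, [(1, 4), (-1, 0)])` gives 105 with one shift and one subtraction.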

Computational algorithm: The computational algorithm is shown in Figure 3. It consists of two repetitive structures; the one indexed by i calculates a[i,j] and b[i,j] for the coefficient f[j].
Figure 3: SOPOT a i k and b i k terms calculation algorithm
Table 3: Example of Two and Three SOPOT

Integer    Two SOPOT                Three SOPOT
13         2^3 + 2^2 = 12 (8%)      2^3 + 2^2 + 2^0 = 13 (0%)
25         2^4 + 2^3 = 24 (4%)      2^4 + 2^3 + 2^0 = 25 (0%)
67         2^6 + 2^1 = 66 (2%)      2^6 + 2^1 + 2^0 = 67 (0%)
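The entries of Table 3 can be reproduced by a simple greedy rule: repeatedly take the largest power of two not exceeding the remaining value. This is an assumption that happens to match the table, not necessarily the algorithm of Figure 3; `sopot_approx` is a hypothetical name, and it uses positive terms only.

```python
def sopot_approx(n, num_terms):
    """Greedy power-of-two approximation of a positive integer n (positive terms only).

    Returns the exponents b_i, largest first; stops early if n is represented exactly.
    """
    exponents = []
    residual = n
    while residual > 0 and len(exponents) < num_terms:
        b = residual.bit_length() - 1      # largest b with 2**b <= residual
        exponents.append(b)
        residual -= 1 << b
    return exponents

for n in (13, 25, 67):
    two = sopot_approx(n, 2)
    three = sopot_approx(n, 3)
    print(n, two, sum(1 << b for b in two), three, sum(1 << b for b in three))
```

This rule is not optimal once signed terms ($a_{i,j} = -1$) are allowed: 15 is exactly $2^4 - 2^0$ with two signed terms, whereas the positive-only greedy gives $2^3 + 2^2 = 12$.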



IV. COMPARISON OF ALGORITHMS BASED ON SURVEY

The DA algorithm reduces the complexity of the FIR filter by converting the MAC operation into a series of look-up table accesses. The CSE algorithm reduces the number of adders by finding multiple occurrences of identical bit patterns, and SOPOT converts the coefficients into power-of-two form and eliminates the complete multiplication block. Among the three algorithms, DA is the easiest to implement, at the cost of storage resources.

V. CONCLUSION 

The realization structure of an FIR filter consists of a MAC operation made up of multipliers and adders. The DA algorithm completely eliminates the multiplication block by converting the complicated MAC operation into look-up table accesses and summations, at the cost of increased storage resources, whereas the DA-OBC based architecture decreases the LUT size by half and makes the operation faster. The CSE algorithm eliminates redundant multipliers and adders by identifying multiple occurrences of identical bit patterns, and the SOPOT algorithm reduces the complexity of the task by completely replacing the multiplication block with adders and shifters. According to our survey, the DA structure is easy to implement on an FPGA because the precalculated results are stored in LUTs, which are easy to design in an FPGA; the main drawback of this algorithm is that it uses extra resources. SOPOT is used for low-power and low-area applications because it uses shift-and-add, multiplexer-based multipliers. The CSE algorithm finds and eliminates the most common bit patterns among the filter coefficients, which saves power and area by replacing multipliers with a small number of adders when implemented in FIR filters.